Summary

The complete 9193-nucleotide sequence of the probable causative agent of AIDS, lymphadenopathy-associated virus (LAV), has been determined. The deduced genetic structure is unique: it shows, in addition to the retroviral gag, pol, and env genes, two novel open reading frames we call Q and F. Remarkably, Q is located between pol and env, and F is half-encoded by the U3 element of the LTR. These data place LAV apart from the previously characterized family of human T cell leukemia/lymphoma viruses.

Introduction 

The recent onset of severe opportunistic infections among previously healthy male homosexuals has led to the characterization of the acquired immune deficiency syndrome (AIDS) (Gottlieb et al., 1981; Masur et al., 1981). The disease has spread dramatically, and new high-risk groups have been identified: patients receiving blood products, intravenous drug addicts, and individuals originating from Haiti and Central Africa (Piot et al., 1984). AIDS is a fatal disease, and there is at present no specific treatment. The causative agent was suspected to be of viral origin since the epidemiological pattern of AIDS was consistent with a transmissible disease, and cases had been reported after treatment involving ultrafiltered anti-hemophilia preparations (Daly and Scott, 1983). A decisive step in AIDS research was the discovery of a novel human retrovirus called lymphadenopathy-associated virus (LAV) (Barre-Sinoussi et al., 1983). The properties of the virus consistent with its etiological role in AIDS are: the recovery of many independent isolates from patients with AIDS or related diseases (Montagnier et al., 1984); a high rate of LAV seropositivity among these populations (Brun-Vezinet et al., 1984); and a tropism and cytopathic effect in vitro for the helper/inducer T-lymphocyte subset T4 (Klatzmann et al., 1984), the subset that is also found depleted in vivo.

Other groups have reported the isolation of human retroviruses, the human T cell leukemia/lymphoma/lymphotropic virus type III (HTLV-III) (Popovic et al., 1984) and the AIDS-associated retrovirus (ARV), which display biological and sero-epidemiological properties very similar to, if not identical with, those of LAV (Levy et al., 1984; Popovic et al., 1984; Schupbach et al., 1984). Both LAV and HTLV-III genomes have been molecularly cloned (Alizon et al., 1984; Hahn et al., 1984). Their restriction maps show remarkable agreement, including a Hind III restriction site polymorphism. Bearing in mind the variability of this virus (Shaw et al., 1984), this agreement confirms that the two viruses represent a single viral lineage.

In addition to its obvious diagnostic and therapeutic 
potential, the LAV DNA nucleotide sequence is essential 
to an understanding of the genetics and molecular biology 
of the virus and its classification among retroviruses. We 
report here the complete 9193-nucleotide sequence of the
LAV genome established from cloned proviral DNA.

Results 

DNA Sequence and Organization of the LAV Genome 
We have reported previously the molecular cloning of both cDNA and integrated proviral forms of LAV (Alizon et al., 1984). The recombinant phage clones were isolated from a genomic library of LAV-infected human T-lymphocyte DNA partially digested by Hind III. The insert of recombinant phage λJ19 was generated by Hind III cleavage within the R element of the long terminal repeat (LTR). Thus each extremity of the insert contains one part of the LTR. We have eliminated the possibility of clustered Hind III sites within R by sequencing part of an LAV cDNA clone, pLAV75 (Alizon et al., 1984), corresponding to this region (data not shown). Thus the total sequence information of the LAV genome can be derived from the λJ19 clone.

Using the M13 shotgun cloning and dideoxy chain termination method (Sanger et al., 1977), we have determined the nucleotide sequence of the λJ19 insert. The reconstructed viral genome with two copies of the R sequence is 9193 nucleotides long. The numbering system starts at the cap site (see below) of virion RNA (Figure 1).
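The λJ19 insert begins and ends inside R. Because of this, both the genome described above (two copies of R) and the LTR have to be rebuilt from the two ends of the insert. The Python sketch below illustrates that bookkeeping under simple assumptions. The insert is taken to read R_b-U5 ... U3-R_a, where R_a and R_b are the portions of R upstream and downstream of the Hind III cut. The function names and toy sequences are hypothetical and are not taken from the authors' software.

```python
def reconstruct_rna_genome(insert: str, r_a_len: int, r_b_len: int) -> str:
    """Rebuild an R-U5 ... U3-R genome from a clone cut once inside each R.

    Assumes the insert reads R_b-U5 ... U3-R_a: R_a (upstream part of R,
    length r_a_len) sits at the insert's 3' end, and R_b (downstream part,
    length r_b_len) sits at its 5' end.
    """
    if r_a_len + r_b_len > len(insert):
        raise ValueError("R fragments cannot be longer than the insert")
    r_a = insert[len(insert) - r_a_len:]  # 3' end of the insert
    r_b = insert[:r_b_len]                # 5' end of the insert
    # R_a + (R_b-U5 ... U3-R_a) + R_b  ==  R-U5 ... U3-R
    return r_a + insert + r_b


def reconstruct_ltr(insert: str, u3_len: int, r_a_len: int, r_b_len: int, u5_len: int) -> str:
    """Juxtapose the sequences flanking the two cloning sites into one U3-R-U5 LTR."""
    left = insert[len(insert) - (u3_len + r_a_len):]  # U3-R_a
    right = insert[:r_b_len + u5_len]                 # R_b-U5
    return left + right


# Toy example (hypothetical sequences): R = GGTTCC cut after GGT,
# U5 = AAA, internal region = CGCG, U3 = TTT.
toy_insert = "TCC" + "AAA" + "CGCG" + "TTT" + "GGT"
print(reconstruct_rna_genome(toy_insert, 3, 3))  # GGTTCCAAACGCGTTTGGTTCC
print(reconstruct_ltr(toy_insert, 3, 3, 3, 3))   # TTTGGTTCCAAA
```

With real data, R_a and R_b would be delimited by the Hind III site itself. The sketch simply takes their lengths as inputs.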

The viral (+) strand contains the standard retroviral genes encoding the core structural proteins (gag), reverse transcriptase (pol), and envelope protein (env), and two extra open reading frames (orf) that we call Q and F (Table 1). The genetic organization of LAV, 5' LTR-gag-pol-Q-env-F-3' LTR, is unique. Whereas in all replication-competent retroviruses the pol and env genes overlap, in LAV they are separated by orf Q (192 amino acids) followed by four small (<100 triplets) orfs. Orf F (206 amino acids) slightly overlaps the 3' end of env and is remarkable in that it is half-encoded by the U3 region of the LTR.

Such a structure clearly places LAV apart from previously sequenced retroviruses (Figure 2). The (-) strand is apparently noncoding. The additional Hind III site of LAV clone λJ81 (with respect to λJ19) maps to the apparently noncoding region between Q and env (positions 5166-5745). Starting at position 5501 is a sequence (AAGCCT) that differs by a single base (C instead of T at the fifth position) from the Hind III recognition sequence (AAGCTT). It is anticipated that many of the restriction site polymorphisms between different isolates will map to this region.
The LTR

The organisation of a reconstructed LTR and viral flanking elements is shown schematically in Figure 3. The LTR is 636 bp long and displays the usual features (Chen and Barker, 1984):

- it is bounded by an inverted repeat (5'ACTG) including the conserved TG dinucleotide (Temin, 1981);
- adjacent to the 5' LTR is the tRNA primer binding site (PBS), complementary to tRNALys (Raba et al., 1979);
- adjacent to the 3' LTR is a perfect 15 bp polypurine tract.

The other three polypurine tracts observed between nucleotides 6200-8800 are not followed by a sequence that is complementary to that just preceding the PBS.
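Runs of purines such as the 15 bp tract can be located with a regular expression. The Python sketch below uses a default minimum run length of 15, the length of the LAV tract; this default is an assumption. It finds maximal A/G runs and exact copies of a given tract. It does not test the additional criterion used in the text, namely whether the run is followed by a sequence complementary to that preceding the PBS.

```python
import re


def polypurine_tracts(seq: str, min_len: int = 15):
    """Maximal A/G runs at least `min_len` long, as (start, end, tract), 1-based inclusive."""
    pattern = re.compile(r"[AG]{%d,}" % min_len)
    return [(m.start() + 1, m.end(), m.group()) for m in pattern.finditer(seq.upper())]


def occurrences(seq: str, motif: str):
    """1-based start positions of every (possibly overlapping) exact copy of `motif`."""
    pattern = re.compile("(?=%s)" % re.escape(motif.upper()))
    return [m.start() + 1 for m in pattern.finditer(seq.upper())]


lav_ppt = "AAAAGAAAAGGGGGG"     # the 15 bp LAV tract quoted in the text
toy = "TT" + lav_ppt + "CT"     # hypothetical flanking bases
print(polypurine_tracts(toy))   # [(3, 17, 'AAAAGAAAAGGGGGG')]
print(occurrences(toy, lav_ppt))  # [3]
```

The same `occurrences` call, applied to a full genome, is one way to look for duplicated copies of a tract.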

Figure 2. Comparison of the Genome Organization of LAV with Those of Human T Cell Leukemia/Lymphoma Virus Type I (HTLV-I) (Seiki et al., 1983), Moloney Murine Leukemia Virus (MoMuLV) (Shinnick et al., 1981), and Rous Sarcoma Virus (RSV) (Schwartz et al., 1983)

The positions and sizes of viral genes are drawn to scale (open boxes), and the viral genomes (RNA forms) are delimited by brackets.

The limits of the U5, R, and U3 elements were determined as follows.

- U5 is located between the PBS and the polyadenylation site, which was established from the sequence of the 3' end of oligo(dT)-primed LAV cDNA (Alizon et al., 1984). Thus U5 is 84 bp long.
- The length of R+U5 was determined by synthesizing tRNA-primed LAV cDNA. After alkaline hydrolysis of the primer, R+U5 was found to be 181 ± 1 bp (Figure 4). Thus R is 97 bp long, and the cap site at its 5' end can be located.
- Finally, U3 is 456 bp long.

The LAV LTR also contains characteristic regulatory elements: a polyadenylation signal sequence AATAAA 19 bp from the R/U5 junction, and the sequence ATATAAG, which is very likely the TATA box, 22 bp 5' of the cap site. There are no long direct repeats within the LTR.

Interestingly, the LAV LTR shows some similarities to that of the mouse mammary tumor virus (MMTV) (Donehower et al., 1981). They both use tRNALys as a primer for (-) strand synthesis, whereas all other exogenous mammalian retroviruses known to date use tRNAPro (Chen and Barker, 1984). They possess very similar polypurine tracts: that of LAV is AAAAGAAAAGGGGGG, while that of MMTV is AAAAAAGAAAAAAGGGGG. Viral (+) strand synthesis is probably discontinuous, since the polypurine tract flanking the U3 element of the 3' LTR is found exactly duplicated in the 3' end of orf pol, at 4331-4346. In addition, MMTV and LAV are exceptional in that the U3 element can encode an orf. In the case of MMTV, U3 contains the whole orf, while in LAV, U3 contains 110 codons of the 3' half of orf F.
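The partition of the LTR above rests on a subtraction. The strong-stop cDNA measures R+U5 (181 ± 1 bp), U5 (84 bp) is read from the sequence, and R follows by difference. The short Python sketch below makes the arithmetic and the carried-over uncertainty explicit. Two assumptions belong to the sketch, not to the text: U5 is treated as exact, and worst-case bounds are added.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Length:
    """A length in bp with a symmetric worst-case uncertainty (+/- err)."""
    value: int
    err: int = 0

    def __sub__(self, other: "Length") -> "Length":
        # Worst-case bounds add when two measurements are subtracted.
        return Length(self.value - other.value, self.err + other.err)


r_plus_u5 = Length(181, 1)  # strong-stop cDNA length (Figure 4)
u5 = Length(84)             # PBS to poly(A) site, read directly from sequence
r = r_plus_u5 - u5
print(f"R = {r.value} +/- {r.err} bp")  # R = 97 +/- 1 bp
```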

Figure 3. Schematic Representation of the LAV Long Terminal Repeat (LTR)

The LTR was reconstructed from the sequence of λJ19 by juxtaposing the sequences adjacent to the Hind III cloning sites. Sequencing of the oligo(dT)-primed LAV cDNA clone pLAV75 (Alizon et al., 1984) rules out the possibility of clustered Hind III sites in the R region of LAV. The LTRs are limited by an inverted repeat sequence (IR). The viral elements flanking the LTRs are also represented: the tRNA primer binding site (PBS) for the 5' LTR and the polypurine tract (PU) for the 3' LTR. Also indicated are a putative TATA box, the cap site, the polyadenylation signal (AATAAA), and the polyadenylation site (CAA). The location of the open reading frame F (648 nucleotides) is shown above the LTR scheme.

Viral Proteins
gag

Near the 5' extremity of the gag orf is a "typical" initiation codon (Kozak, 1984) (position 336). It is not only the first in the gag orf but also the first downstream of the cap site. The precursor protein is 500 amino acids long. Its calculated Mr of 55,641 agrees with the 55 kd gag precursor polypeptide (Luc Montagnier, unpublished results).

The N-terminal amino acid sequence of the major core protein p25, obtained by microsequencing (Genetic Systems, personal communication), matches perfectly with the translated nucleotide sequence starting from position 732 (see Figure 1). This formally makes the link between the cloned LAV genome and the immunologically characterized LAV p25 protein. The protein encoded 5' of the p25 coding sequence is rather hydrophilic; its calculated Mr of 14,866 is consistent with that of the gag protein p18.

The 3' part of the gag region probably codes for the retroviral nucleic acid binding protein (NBP). Indeed, as in HTLV-I (Seiki et al., 1983) and RSV (Schwartz et al., 1983), the motif Cys-X2-Cys-X4-His-X4-Cys common to all NBPs (Oroszlan et al., 1984) is found duplicated (nucleotides 1509 and 1572 in the LAV sequence). Consistent with its function, the putative NBP is extremely basic (17% Arg + Lys).
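The motif search and basic-residue count mentioned above are both simple to express. This Python sketch assumes the canonical retroviral nucleocapsid spacing Cys-X2-Cys-X4-His-X4-Cys and uses a hypothetical peptide. It is an illustration, not the screening procedure the authors used.

```python
import re

# Cys-X2-Cys-X4-His-X4-Cys in one-letter code; the lookahead reports overlapping hits.
NBP_MOTIF = re.compile(r"(?=C.{2}C.{4}H.{4}C)")


def find_nbp_motifs(protein: str):
    """1-based start positions of every match of the motif."""
    return [m.start() + 1 for m in NBP_MOTIF.finditer(protein.upper())]


def arg_lys_percent(protein: str) -> float:
    """Percentage of residues that are Arg (R) or Lys (K)."""
    protein = protein.upper()
    return 100.0 * sum(aa in "RK" for aa in protein) / len(protein)


toy = "AACKKCGGGGHAAAACAA"       # hypothetical peptide with one motif copy
print(find_nbp_motifs(toy))      # [3]
print(arg_lys_percent("RKAA"))   # 50.0
```

Running `find_nbp_motifs` on a translated gag sequence would report each copy, so a duplicated motif shows up as two positions.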
pol

The reverse transcriptase gene can encode a protein of up to 1003 amino acids (calculated Mr = 113,629). The first methionine codon is 92 triplets from the origin of the open reading frame. The protein may therefore be translated from a spliced messenger RNA, giving a gag-pol polyprotein precursor.
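One can check whether an ORF's first methionine lies well inside it, as for pol (92 triplets from the origin). The approach is to split a frame at stop codons and locate the first in-frame ATG. The Python sketch below does this for one forward frame. It assumes the ORF origin is the first codon after the preceding stop, and the toy sequence is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_atg(seq: str, start: int, end: int):
    """0-based offset of the first in-frame ATG in seq[start:end], or None."""
    for i in range(start, end, 3):
        if seq[i:i + 3] == "ATG":
            return i
    return None


def stop_to_stop_orfs(seq: str, frame: int = 0, min_codons: int = 100):
    """Open reading frames bounded by stop codons in one forward frame.

    Returns (orf_start, stop_start, first_atg) as 0-based offsets. orf_start
    is the first codon after the preceding stop (or the frame start).
    ORFs not closed by a stop codon before the end of seq are ignored.
    """
    seq = seq.upper()
    orfs = []
    start = frame
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            if (i - start) // 3 >= min_codons:
                orfs.append((start, i, first_atg(seq, start, i)))
            start = i + 3
    return orfs


toy = "TAA" + "GCC" * 3 + "ATG" + "GCC" + "TAG"  # hypothetical sequence
orf_start, stop, atg = stop_to_stop_orfs(toy, frame=0, min_codons=1)[0]
print((atg - orf_start) // 3)  # 3 codons between the ORF origin and its first ATG
```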

The pol coding region is the only one in which significant homology has been found with other retroviral protein sequences. Three domains of homology are apparent.

The first is a very short region of 17 amino acids (starting at 1856). Homologous regions are located within the p15gag protease of RSV (Dittmar and Moelling, 1978) and a polypeptide encoded by an open reading frame located between gag and pol of HTLV-I (Figure 5) (Schwartz et al., 1983; Seiki et al., 1983). This first domain could thus correspond to a conserved sequence in viral proteases. Its different locations within the three genomes may not be significant, since retroviruses, by splicing or other mechanisms, express a gag-pol polyprotein precursor (Schwartz et al., 1983; Seiki et al., 1983).

The second and most extensive region of homology (starting at 2048) probably represents the core sequence of the reverse transcriptase. Over a region of 250 amino acids, with only minimal insertions or deletions, LAV shows 38% amino acid identity with RSV, 25% with HTLV-I, and 21% with MoMuLV (Shinnick et al., 1981), while HTLV-I and RSV show 38% identity in the same region.

A third homologous region is situated at the 3' end of the pol reading frame and corresponds to part of the pp32 peptide of RSV, which has exonuclease activity (Misra et al., 1982). Once again, there is greater homology with the corresponding RSV sequence than with HTLV-I.
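Identity figures such as 38%, 25%, and 21% depend on how gapped columns are counted. One common convention counts identical residues over alignment columns where neither sequence has a gap. The Python sketch below assumes that convention, which is not necessarily the one used for the values above.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # insertion/deletion columns are left out of the denominator
        compared += 1
        identical += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


print(percent_identity("ACD-EF", "ACDGEY"))  # 80.0
```

Dividing instead by the full alignment length, or by the shorter sequence, would give lower figures for the same alignment.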

Figure 4. Synthesis of tRNA-Primed LAV cDNA to R + U5 (Strong-Stop cDNA)

Lanes 1 and 2 show two different quantities of cDNA, while lanes M and M' represent markers. The strong-stop cDNA is 181 bases long, with a second, less intense band. The error of estimation is ±1 bp. This maps the major cap site to the second G residue of the sequence CTGGGTCT within the LTR, 24 nucleotides downstream of the TATA box. This guanosine residue is taken as the first base in the nucleotide sequence shown in Figure 1.

Figure 5. Location of a Short Stretch of Homology in the gag-pol Region of the LAV, HTLV-I (Seiki et al., 1983), and RSV (Schwartz et al., 1983) Genomes

Conserved amino acids are boxed. The homologous region is indicated by the solid bar in the schema. Each virus is organized differently in this region, but the sequence in the RSV genome maps to p15gag, which has a protease-associated function.

env

The env open reading frame has a possible initiator methionine codon very near the beginning (eighth triplet). If so, the molecular weight of the presumed env precursor protein (861 amino acids, Mr calc = 97,376) is consistent with the known size of the LAV glycoprotein (110 kd, and 90 kd after glycosidase treatment; Luc Montagnier, unpublished). There are 32 potential N-glycosylation sites (Asn-X-Ser/Thr), which are overlined in Figure 1. An interesting feature of env is the very high number of Trp residues at both ends of the protein.

There are three hydrophobic regions, characteristic of retroviral envelope proteins (Seiki et al., 1983):

- a signal peptide (encoded by nucleotides 5815-5850);
- a second region (7315-7350);
- a transmembrane segment (7831-7896).

The second hydrophobic region (7315-7350) is preceded by a stretch rich in Arg + Lys. This may represent a site of proteolytic cleavage, which, by analogy with other retroviral proteins, would give an external envelope polypeptide and a membrane-associated protein (Seiki et al., 1983; Kiyokawa et al., 1984). A striking feature of the LAV envelope protein sequence is that the region following the transmembrane segment is of unusual length (150 residues). The env protein shows no homology to any sequence in protein data banks. The small amino acid motif common to the transmembrane proteins of all leukemogenic retroviruses (Cianciolo et al., 1984) is not present in LAV env.
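Two sequence features of env discussed above, Asn-X-Ser/Thr glycosylation motifs and hydrophobic segments, can both be found by simple scans of the translated sequence. This Python sketch is illustrative only. Several choices in it are assumptions:

- the Kyte-Doolittle scale, the 19-residue window, and the 1.6 threshold, which are conventional for spotting membrane-spanning segments;
- the optional exclusion of Pro at the X position, since the text counts every Asn-X-Ser/Thr.

```python
import re

# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequon_positions(protein: str, exclude_proline: bool = False):
    """1-based positions of the Asn in Asn-X-Ser/Thr motifs (overlaps allowed)."""
    x = "[^P]" if exclude_proline else "."
    pattern = re.compile(r"(?=N" + x + r"[ST])")
    return [m.start() + 1 for m in pattern.finditer(protein.upper())]


def hydrophobic_windows(protein: str, window: int = 19, threshold: float = 1.6):
    """1-based starts of windows whose mean hydropathy exceeds `threshold`.

    Residues missing from the table (e.g. X) are scored 0.0.
    """
    protein = protein.upper()
    starts = []
    for i in range(len(protein) - window + 1):
        segment = protein[i:i + window]
        mean = sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in segment) / window
        if mean > threshold:
            starts.append(i + 1)
    return starts


print(sequon_positions("ANGSNPT"))                        # [2, 5]
print(sequon_positions("ANGSNPT", exclude_proline=True))  # [2]
```

Consecutive window starts returned by `hydrophobic_windows` can be merged into segments and compared with the nucleotide coordinates given above after translation.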
Q and F

The location of orf Q is without precedent in the structure of retroviruses. Orf F is unique in that it is half-encoded by the U3 element of the LTR. Both orfs have strong initiator codons (Kozak, 1984) near their 5' ends. They can encode proteins of 192 amino acids (Mr calc = 22,487) and 206 amino acids (Mr calc = 23,316), respectively. Both putative proteins are hydrophilic (pQ 49% polar, 15.1% Arg + Lys; pF 46% polar, 11% Arg + Lys) and are therefore unlikely to be associated directly with membranes.

The function of the putative proteins pQ and pF cannot be predicted, as no homology was found by screening protein sequence data banks. There is no detectable homology between orf F and the pX protein of HTLV-I. Furthermore, their hydrophobicity/hydrophilicity profiles are completely different.

It is known that retroviruses can transduce cellular genes, notably proto-oncogenes (Weinberg, 1982). We suggest that orfs Q and F represent exogenous genetic material and not some vestige of cellular DNA. Two observations support this: LAV DNA does not hybridize to the human genome under stringent conditions (Alizon et al., 1984), and the codon usage of Q and F is comparable to that of the gag, pol, and env genes (data not shown).
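A "strong" initiator context in the sense of Kozak (1984) is usually judged from two positions around the ATG: a purine at -3 and a G at +4. The Python sketch below applies that reading as an assumption. Its three-way strong/adequate/weak labels are a common simplification, not a classification given in the text.

```python
def kozak_context(seq: str, atg_index: int) -> str:
    """Classify the start context of the ATG beginning at 0-based `atg_index`.

    'strong': purine (A/G) at -3 and G at +4; 'adequate': only one of the two;
    'weak': neither. Positions are numbered with the A of ATG as +1.
    """
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    minus3 = seq[atg_index - 3] if atg_index >= 3 else None
    plus4 = seq[atg_index + 3] if atg_index + 3 < len(seq) else None
    purine_at_minus3 = minus3 in ("A", "G")
    g_at_plus4 = plus4 == "G"
    if purine_at_minus3 and g_at_plus4:
        return "strong"
    if purine_at_minus3 or g_at_plus4:
        return "adequate"
    return "weak"


print(kozak_context("GCCACCATGG", 6))  # strong
```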

Relationship to Other Retroviruses 
Although LAV is both morphologically and biochemically (Barre-Sinoussi et al., 1983) distinct from HTLV-I and -II, it remained possible that its genome was organized in a similar manner. The HTLV-I and -II genomes share characteristic features with the more distantly related bovine leukemia virus (BLV) (Rice et al., 1984), and these features are not observed in LAV. They are:

- a region 3' of the envelope gene consisting of a noncoding stretch (600-900 bp) followed by a coding sequence of 307-357 codons (the X open reading frame), which may slightly overlap the U3 region of the LTR (Seiki et al., 1983; Rice et al., 1984; Sagata et al., 1984);
- an LTR composed of unusually long U5 and R elements, with the polyadenylation signal situated in U3 instead of R (Seiki et al., 1983; Sagata et al., 1984; Shimotohno et al., 1984).

We show here that, in contrast, the 3' end of the LAV envelope gene overlaps an open reading frame, termed F. This frame has the coding capacity for 206 amino acids and extends within the LTR (110 amino acids are encoded by the U3 region). The putatively encoded polypeptide (pF), whose primary structure can be deduced, does not show any homology with the theoretical X gene products of the HTLV/BLV family. Also, the U5 and R elements are shorter (Table 2), and the polyadenylation signal is located within R, as is the case for all retroviruses except the HTLV/BLV group. Additionally, LAV uses tRNALys as (-) strand primer, as opposed to the tRNAPro employed by all other mammalian retroviruses except MMTV (Donehower et al., 1981). The homologies detected between the polymerase and protease domains of LAV and HTLV are also found in several retroviruses, RSV in particular.

Table 2. Comparison of the Size of the LAV LTR and LTR-Related Elements to Those of Other Retroviruses

Adapted from Chen and Barker (1984).
i = imperfect match or tract.
SNV = spleen necrosis virus (Shimotohno and Temin, 1982).

It has been reported that a cloned HTLV-III genome hybridizes (Tm − 26°C) to sequences in the gag-pol and X regions of HTLV-I and -II. Restriction maps of cloned LAV and HTLV-III show almost perfect agreement (Hahn et al., 1984), yet we were unable to detect any such hybridization between LAV and HTLV-II (Tm = 55°C) (Alizon et al., 1984). Indeed, there is only a single short region of homology between LAV and HTLV-I (23/27 nucleotides starting at position 1859 in the LAV sequence), and nothing significant between the two viruses in the X region of HTLV-I. One possible reason for this discrepancy is that HTLV-III is subtly different from LAV. However, it was subsequently reported that there was very minimal, if any, homology between orf X (of HTLV-I) and HTLV-III (Shaw et al., 1984).

Discussion 

Regulatory sequences carried by retroviral LTRs are believed to be involved in specific interactions between the viral genome and the host cell (Srinivasan et al., 1984). The LTR sequences of LAV are unique among retroviruses. This could reflect an original mode of gene expression, possibly in relation to particular transcriptional factors present in the virus-harboring cell. This hypothesis can be tested by studying the regulatory activity of the LAV LTR sequences in transient or long-term experiments involving an indicator gene and different cellular contexts.

The presence of the Q and F reading frames in addition to the conventional gag-pol-env set of genes is unexpected. One should now address the question of their role in the viral cycle and pathogenicity by trying to characterize their protein products. It is tempting to speculate on a role of such polypeptide(s) in T4 cell mortality. This problem can be studied by designing synthetic peptides for antibody production or by using site-directed mutagenesis of the Q and F coding regions.

The peculiar genetic structure of LAV poses the question of its origin. The virus shares common traits with other (apparently unrelated) retroviruses. For instance, the unusually large size of the outer membrane glycoprotein (env) and a comparably sized genome are also observed in lentiviruses such as visna (Harris et al., 1981; Querat et al., 1984). The presence of a large part of the F open reading frame in the LTR, and the use of tRNALys as a primer for (-) strand synthesis, are reminiscent of the mouse mammary tumor virus. On the other hand, homologies in the pol gene would suggest that LAV is closer to RSV than to any other retrovirus. Obviously, no clear picture of phylogeny can be drawn from the DNA sequence analysis.

It may well be, then, that LAV defines a new group of retroviruses that have been evolving independently for a considerable period of time, rather than a variant recently derived from a characterized viral family. Both the epidemiology and the pathogenesis of AIDS should be reconsidered with this idea in mind, when trying to answer questions such as these:

- Are there other human or animal diseases that are associated with similarly organized viruses?
- Is there a precursor to AIDS-associated virus(es) normally present, in latent form, in human populations?
- What triggered in this case the recent spreading of pathogenic derivatives?

Experimental Procedures
M13 Cloning and Sequencing

Total λJ19 DNA was sonicated, treated with the Klenow fragment of DNA polymerase plus deoxyribonucleotides (2 hr, 16°C), and fractionated by agarose gel electrophoresis. Fragments of 300-600 bp were excised, electroeluted, and purified by Elutip (Schleicher and Schuell) chromatography. DNA was ethanol-precipitated using 10 µg dextran T40 (Pharmacia) as carrier. It was then ligated to dephosphorylated, Sma I-cleaved M13mp8 RF DNA using DNA and RNA ligases (16°C) and transfected into E. coli strain TG1. Recombinant clones were detected by plaque hybridization using the appropriate 32P-labeled LAV restriction fragments as probes. Single-stranded templates were prepared from plaques exhibiting positive hybridization signals. They were sequenced by the dideoxy chain termination procedure (Sanger et al., 1977) using α-35S-dATP (Amersham, 400 Ci/mmol) and buffer gradient gels (Biggin et al., 1983). Sequences were compiled and analyzed using the programs of Staden adapted by B. Caudron for the Institut Pasteur Computer Center (Staden, 1982).

Strong-Stop cDNA

LAV virions from infected T lymphocyte culture supernatant (Barre-Sinoussi et al., 1983) were pelleted through a 20% sucrose cushion. The cDNA (-) strand was synthesized as described previously (Alizon et al., 1984), except that no exogenous primer was used. The cDNA then went through alkaline hydrolysis (0.3 M NaOH, 30 min, 65°C), neutralization, and phenol extraction. It was ethanol-precipitated and loaded onto a 6% acrylamide/8 M urea sequencing gel, with sequence ladders as size markers.